AITopics | end-to-end sign language translation

Collaborating Authors

end-to-end sign language translation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Cross-modality Data Augmentation for End-to-End Sign Language Translation

Ye, Jinhui, Jiao, Wenxiang, Wang, Xing, Tu, Zhaopeng, Xiong, Hui

arXiv.org Artificial IntelligenceOct-18-2023

End-to-end sign language translation (SLT) aims to convert sign language videos into spoken language texts directly without intermediate representations. It has been a challenging task due to the modality gap between sign videos and texts and the data scarcity of labeled data. To tackle these challenges, we propose a novel Cross-modality Data Augmentation (XmDA) framework to transfer the powerful gloss-to-text translation capabilities to end-to-end sign language translation (i.e. video-to-text) by exploiting pseudo gloss-text pairs from the sign gloss translation model. Specifically, XmDA consists of two key components, namely, cross-modality mix-up and cross-modality knowledge distillation. The former explicitly encourages the alignment between sign video features and gloss embeddings to bridge the modality gap. The latter utilizes the generation knowledge from gloss-to-text teacher models to guide the spoken language text generation. Experimental results on two widely used SLT datasets, i.e., PHOENIX-2014T and CSL-Daily, demonstrate that the proposed XmDA framework significantly and consistently outperforms the baseline models. Extensive analyses confirm our claim that XmDA enhances spoken language text generation by reducing the representation distance between videos and texts, as well as improving the processing of low-frequency words and long sentences.

cross-modality data augmentation, end-to-end sign language translation

arXiv.org Artificial Intelligence

2305.11096

Genre: Research Report (0.40)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.80)

Add feedback

A New Dataset for End-to-End Sign Language Translation: The Greek Elementary School Dataset

Voskou, Andreas, Panousis, Konstantinos P., Partaourides, Harris, Tolias, Kyriakos, Chatzis, Sotirios

arXiv.org Artificial IntelligenceOct-7-2023

Automatic Sign Language Translation (SLT) is a research avenue of great societal impact. End-to-End SLT facilitates the interaction of Hard-of-Hearing (HoH) with hearing people, thus improving their social life and opportunities for participation in social life. However, research within this frame of reference is still in its infancy, and current resources are particularly limited. Existing SLT methods are either of low translation ability or are trained and evaluated on datasets of restricted vocabulary and questionable real-world value. A characteristic example is Phoenix2014T benchmark dataset, which only covers weather forecasts in German Sign Language. To address this shortage of resources, we introduce a newly constructed collection of 29653 Greek Sign Language video-translation pairs which is based on the official syllabus of Greek Elementary School. Our dataset covers a wide range of subjects. We use this novel dataset to train recent state-of-the-art Transformer-based methods widely used in SLT research. Our results demonstrate the potential of our introduced dataset to advance SLT research by offering a favourable balance between usability and real-world value.

dataset, end-to-end sign language translation, greek elementary school dataset, (1 more...)

arXiv.org Artificial Intelligence

2310.04753

Genre: Research Report > New Finding (0.53)

Industry:

Education > Curriculum > Subject-Specific Education (1.00)
Education > Educational Setting > K-12 Education (0.60)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.60)

Add feedback